Main Findings

Scatterplot: Musical Sophistication and Valence Estimation Error

Interactive Experiment: Intro

We challenge you to beat our participants!

Grab a pen and paper! In this experiment demo you get to experience what our research is about in an interactive way. In the following tabs we have included short versions of the different parts of our survey.

First, you get to take a mini Gold-MSI test. The result places you on the musical sophistication scale. The higher your score, the more knowledge of and experience with music you have.

The second part consists of a few songs that we used in the survey. It is up to you to determine the valence value of each song to the best of your ability! For both the Gold-MSI test and the song estimation test, the results are on the subsequent tab.

How to grade yourself is explained on the answer tabs. After you have determined your scores, you can see in the graph where you would have approximately been in our sample and if you have surpassed the participants. Good luck!

Mini-MSI

I spend a lot of my free time doing music-related activities.

1 — 2 — 3 — 4 — 5 — 6 — 7

I can compare and discuss differences between two performances or versions of the same piece of music.

1 — 2 — 3 — 4 — 5 — 6 — 7

When I sing, I have no idea whether I’m in tune or not.

1 — 2 — 3 — 4 — 5 — 6 — 7

At the peak of my interest, I practiced ___ hours per day on my primary instrument.

0 — 0.5 — 1 — 1.5 — 2 — 3-4 — 5 or more

Musical Fragments

What do you think is the valence of these four musical fragments? Values range from 1 to 7, with 1 meaning extremely low valence, 7 extremely high, and 4 neither high nor low.

SongID: 40

SongID: 80

SongID: 155

SongID: 202

Answers and Grading

First, the Mini-MSI results.

Each answer is worth a maximum of 7 points. For questions 1, 2 and 4, the first option is worth 1 point, the second 2 points, and so on. For question 3, which is negatively worded, the scores are reversed: the first option is worth 7 points, the second 6 points, and so on. Multiply your total by 4.5 to see roughly what your score would have been on the full test.
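The grading rule can also be written as a small function. This is only a sketch: the function name and its parameters are our own illustration, not part of the Gold-MSI, and the question numbers that are reverse-scored must be passed in as stated in the grading rule above.

```python
def mini_msi_score(answers, reverse_scored, scale_max=7, factor=4.5):
    """Return (raw_total, scaled_total) for the four-item mini test.

    answers: position of the chosen option per question (1 = first option).
    reverse_scored: set of 1-based question numbers whose points run from
        scale_max down to 1, as named in the grading rule.
    """
    if len(answers) != 4:
        raise ValueError("expected four answers")
    raw = 0
    for question, position in enumerate(answers, start=1):
        if not 1 <= position <= scale_max:
            raise ValueError(f"answer {position} is outside 1..{scale_max}")
        if question in reverse_scored:
            # Reverse-scored item: the first option earns scale_max points.
            raw += scale_max + 1 - position
        else:
            raw += position
    # The factor rescales the 4-item total to the range of the full test.
    return raw, raw * factor
```

With four items worth at most 7 points each, the raw total cannot exceed 28, so the scaled score tops out at 28 × 4.5 = 126.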

For the music we give the correct valence values. It is up to you to determine how far off you were, or whether you were exactly correct.

Song 40:
  True valence: 4
  Average estimate: 5.5
  Times rated correct / Times tested: 1/10

Song 80:
  True valence: 7
  Average estimate: 4.63
  Times rated correct / Times tested: 0/8

Song 155:
  True valence: 2
  Average estimate: 2.78
  Times rated correct / Times tested: 4/9

Song 202:
  True valence: 3
  Average estimate: 3.11
  Times rated correct / Times tested: 3/9
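If you prefer not to grade by hand, the comparison can be sketched in a few lines. The true valences and sample averages are the ones listed above; the function name and the report layout are illustrative assumptions. Note that the gap of the sample's average estimate is not the same as the average gap of individual participants, so this is only a rough benchmark.

```python
TRUE_VALENCE = {40: 4, 80: 7, 155: 2, 202: 3}
SAMPLE_AVERAGE = {40: 5.5, 80: 4.63, 155: 2.78, 202: 3.11}

def grade(estimates):
    """Compare your estimates ({song_id: rating 1..7}) with the listed values."""
    report = {}
    for song_id, estimate in estimates.items():
        truth = TRUE_VALENCE[song_id]
        own_gap = abs(estimate - truth)
        sample_gap = abs(SAMPLE_AVERAGE[song_id] - truth)
        report[song_id] = {
            # Positive error: you rated the song higher than its true valence.
            "error": estimate - truth,
            "closer_than_sample_average": own_gap < sample_gap,
        }
    return report
```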


On the following tab you can view the raw data of our participants, and you can see how well they did on the songs you listened to by searching for the songs’ IDs (40, 80, 155, 202) in the search bar.

Background

Introduction

Nowadays, music is more present in our lives than ever before. According to the International Federation of the Phonographic Industry (2018), an average person spends around 18 hours a week listening to music. The majority of this time is spent on streaming platforms like Spotify, which have displayed consistent growth in revenue over the past decade. The fact that music has become omnipresent in our lives raises the question of whether this makes people better at identifying and assessing certain musical features. In other words, is there a difference between how average listeners and musically sophisticated individuals perceive features of music? If this turned out to be true, it would indicate that the way we think about music does, in fact, depend on our exposure to it.

Although this question has not been addressed directly in the academic literature, researchers have investigated how differences in musical engagement between people lead to differences in the perception of musical valence and arousal, with valence being defined as the positiveness of a track (Frijda, 1986), and arousal as the state of being alert or awake (Warriner et al., 2013). In their research, Olsen and colleagues (2014) conclude that such a difference exists: the variable musical engagement is a significant predictor of perceived musical arousal and valence. However, the relationship between musical sophistication and perceived musical features had not been investigated at the time of writing.

To fill this gap in the literature, we decided to investigate the following research question in our study: are musically sophisticated people better at estimating a musical piece’s valence? We specifically chose to scrutinize the relationship between musical sophistication and valence, as valence is a clearly defined, well-studied emotional dimension in psychology. To answer our question, we make use of surveys in which we ask our participants to estimate the valence of 25 randomly selected songs. We subsequently compare their estimates not only with each other, but also with Spotify’s valence rating of the song: our point of reference, used by the world’s biggest streaming platform (Spotify, 2018).

Raw data

Materials

Song pool: the song pool contains 240 songs, with 20 songs per genre. The genres include: pop, rock, metal, electronic, dance, house, hip-hop, singer/songwriter, soundtrack, R&B, soul/blues and classical music.

Randomized playlist: from the song pool, 25 songs are randomly selected for each participant to listen to and rate.

Gold-MSI: a questionnaire used to measure musical sophistication, calculated as the total number of points on the measure. Only the short version of the questionnaire is used, i.e., only the items that map onto the general musical sophistication factor.

Trial round: prior to the main valence estimation task, participants are presented with two songs whose valence they rate, and are then provided with the true valence of these songs. This helps the participants better conceptualize what valence is, and what the main task will look like.

Valence rating: a 7-point Likert scale, ranging from “Extremely low” (1) to “Extremely high” (7), used to estimate a song’s valence.

Familiarity measure: a measure used to indicate whether the participant is familiar with the song that they are rating. This is used in later analysis to see if familiarity affects participants’ judgements of a song’s valence. Measured as 1 (familiar) or 0 (unfamiliar).

R Shiny web app: the web app is used to construct a randomized playlist from the song pool, administer the Gold-MSI, and collect valence and familiarity data.
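The playlist randomization described in the list above can be sketched as follows. This is not the code of the R Shiny app; it is a Python illustration in which the song ID scheme, the function names, and the choice to sample without replacement (so that no participant hears a song twice) are assumptions.

```python
import random

GENRES = ["pop", "rock", "metal", "electronic", "dance", "house", "hip-hop",
          "singer/songwriter", "soundtrack", "R&B", "soul/blues", "classical"]

def build_song_pool(songs_per_genre=20):
    """Return (song_id, genre) pairs; the 1..240 numbering is hypothetical."""
    pool = []
    for genre_index, genre in enumerate(GENRES):
        for k in range(songs_per_genre):
            pool.append((genre_index * songs_per_genre + k + 1, genre))
    return pool

def draw_playlist(pool, size=25, seed=None):
    """Draw one participant's playlist from the pool, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(pool, size)
```

Because each playlist is drawn independently of genre, a participant is not guaranteed to hear every genre, nor the same number of songs per genre.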

Song selection

In total, 242 songs were used in this experiment.

Two songs were the same for each participant, namely the trial round songs. These were selected to help participants better conceptualize musical valence. The first song had an extremely high valence value (7), whereas the second song was closer to the middle of the scale and had a slightly low value (3).

The selection process for the other 240 songs began with a breakdown of musical genres. We chose 12 genres based on a survey of 19,000 people aged 16 to 64, which measured the 12 most consumed genres of music in 18 countries (IFPI, 2018). For each genre, 20 songs were selected using the website ‘rateyourmusic.com’, where albums can be sorted by genre and by their average user-submitted rating. We sampled songs from these highly rated albums, working under the assumption that well-regarded albums in a given genre are the most representative of that genre. Within each genre, the sample contained only one song per artist.

During the selection process, we also took note of the valence of the sampled songs. We made sure that, per genre, no valence value was overrepresented or underrepresented, given what is usual for that genre. For example, if a genre is generally characterized by songs with higher valence, the song pool for that genre is also skewed towards higher values on average (compared to the song pool of a genre characterized by lower-valence songs).

Once the songs were selected, we trimmed each musical item’s length. As a rule of thumb, we chose the 15 seconds starting one minute of playback into the song. Some of the selected songs started with an intro; this did not count towards the minute.
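The trimming rule of thumb can be expressed as a short calculation. The sketch below assumes that "did not count towards the minute" means the minute is measured from the end of the intro; the function and parameter names are illustrative.

```python
def fragment_window(intro_seconds=0.0, offset_seconds=60.0, length_seconds=15.0):
    """Return (start, end) of the fragment in seconds from the start of the file."""
    # The one-minute offset is counted from where the intro ends.
    start = intro_seconds + offset_seconds
    return start, start + length_seconds
```

For a song without an intro this gives the window from 60 to 75 seconds; for a song with a 12.5-second intro it gives 72.5 to 87.5 seconds.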

Procedure

Participants begin the experiment by completing the Gold-MSI to collect measures for musical sophistication. Participants are then directed to the main experiment, where they are first presented with a practice round. In the practice round, participants listen to two songs, and are then asked to rate each song’s valence. After each rating in the practice round, participants are presented with the song’s true valence as determined by Spotify. After this, participants proceed to the main task, which contains the randomized playlist. After listening to a song from the playlist, participants rate the valence using a 7-point Likert scale, and indicate whether they are familiar with the song.

Exploration

Violin: Valence Estimation Error Density

Jitter: Musical Sophistication & Valence Estimation by Genre

Scatter: True Valence & Estimated Valence

Bar: Total Estimated Valence Error by Genre

Violin Interpretation

The violin graphs depict the density of the valence estimation errors per genre. This plot consists of two parts: one is the box plot, the other is the ‘violin’. From the box plot we can see the minimum and maximum responses, as well as the first quartile, median and third quartile. The thickness of the violin indicates the number of observations for a particular valence deviation.

From this graph we can deduce whether the valences in particular genres were systematically over- or underestimated by participants. For five genres this is not the case: Electronic, Hip-Hop, Pop, R&B and Singer/Songwriter all have a median of 0. For the other genres, however, the medians range from -1 to +2. The largest deviations are for Dance and House, where estimates are typically two points above the true valence. For Soundtracks, the first quartile and median are both equal to +1, but the third quartile is +4.
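The numbers read off the box plots (minimum, quartiles, median, maximum) can be computed per genre as in the sketch below. It assumes the error is the estimate minus the true valence, so that overestimation is positive, and it uses the "inclusive" quartile method of Python's statistics module; the quartile convention of the original plots may differ. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def error_summary_by_genre(observations):
    """observations: iterable of (genre, estimate, true_valence) tuples."""
    errors = defaultdict(list)
    for genre, estimate, truth in observations:
        errors[genre].append(estimate - truth)  # positive = overestimated
    summary = {}
    for genre, values in errors.items():
        if len(values) < 2:
            # Quartiles are undefined for a single value; repeat it instead.
            q1 = median = q3 = float(values[0])
        else:
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
        summary[genre] = {
            "n": len(values), "min": min(values), "q1": q1,
            "median": median, "q3": q3, "max": max(values),
        }
    return summary
```

A genre whose median is 0 shows no systematic over- or underestimation by this measure, even if individual errors are large.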

Jitter Interpretation

In this jitter plot, the valence estimation error (i.e., the difference between the true valence and a participant’s estimate) is plotted against the Gold-MSI score and grouped per music genre. We used linear regression to investigate the potential relationship between the dependent and independent variable per genre. It is important to note that, as opposed to the plot on the main page, every dot in this graph represents one attempt at estimating the valence, not one participant. Because this means that we have multiple observations per MSI value, the function geom_jitter() was used to avoid a straight line of points and make the graph more intuitive to interpret.

As becomes evident from the graph, there seems to be no significant negative relationship between the valence estimation error and musical sophistication for any music genre, contrary to what we hypothesized. Only the fitted lines for the genres pop, soul/blues and R&B show a very small negative slope. In fact, the genres hip-hop, rock and house even display a small positive relationship. Another interesting finding is that the genres house, dance and soundtracks seem to be systematically overvalued, as indicated by fitted lines well above 0. This is in accordance with the findings from the violin plot and constitutes an interesting topic for further research.
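The per-genre fit and the jittering can be illustrated with a minimal sketch. It fits an ordinary least-squares line of error on Gold-MSI score for each genre and adds small uniform noise to values before plotting. The function names are ours, and the jitter helper is only a stand-in for what geom_jitter() does in the actual plot, not a reimplementation of it.

```python
import random
from collections import defaultdict

def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_by_genre(rows):
    """rows: iterable of (genre, msi_score, error), one row per rating attempt."""
    grouped = defaultdict(lambda: ([], []))
    for genre, msi, error in rows:
        grouped[genre][0].append(msi)
        grouped[genre][1].append(error)
    return {genre: ols_line(xs, ys) for genre, (xs, ys) in grouped.items()}

def jitter(values, width=0.2, seed=None):
    """Shift each value by uniform noise in [-width, width] for display only."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]
```

The jittered values should only be used for drawing; the regression is fitted on the original, unjittered data.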

Scatter Interpretation

As can be seen, some of the genres have similar graphs, but overall the graphs differ considerably from each other. A single graph including all genres would not lead to a clear result, so we opted for genre-specific graphs. For most genres there is a positive trend, which is shown most clearly by the data for R&B and Singer-Songwriter. Another noteworthy result is that for Dance and House the lower valence values are overestimated on average, whereas for Metal the higher valence values are underestimated on average. For Electronic, Hip-Hop and Rock most average estimates are around the central value, which indicates either a central tendency response pattern, or that the degree of positivity is not clearly distinguishable for these genres.

It can also be seen that, for most genres, songs with extreme valence values (1 or 7) are not recognized as such by all participants who rated them. These results raise the question of whether the valence values determined by Spotify are actually representative of the valence or positivity a song evokes in people. The Soundtracks graph raises the same question: Spotify determined all but one track to have a valence value of 1 or 2, but the average judgement of the participants is very spread out.

Bar Interpretation

The bar charts depict the total valence error per genre. The bars are centered around 0, so it is easy to see whether a genre is generally over- or underestimated. Each ‘slice’ in a bar represents an observation of a song in that genre. The bars are colored by the estimated valence error, and the thickness of each ‘slice’ corresponds to this error: e.g., if a song in the Rock genre was overvalued by 5, then the song’s ‘slice’ is placed in the positive half of the column and has a thickness of 5 units. This means that if an observation was correct, i.e., had an error value of 0, then that slice is not depicted in the graph, which makes it easier to visually inspect whether a genre is generally over- or underestimated.

In accordance with the violin and jitter plots, it seems that songs in the Classical, Dance, House and Soundtracks genres were systematically overestimated. Furthermore, errors of 4 to 6 points made up a visibly larger share of the observations in the Classical, Dance, House and Soundtracks genres. Taken together, the scatterplot and bar chart suggest that Classical and Soundtracks are underestimated by Spotify, and that Dance and House are overestimated by people (this might be due to the energetic nature of songs in these genres).
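The quantity behind each bar can be sketched as two running totals per genre: the summed overestimation (stacked above 0) and the summed underestimation (stacked below 0). The function name and input layout are illustrative assumptions; errors are taken as estimate minus true valence.

```python
from collections import defaultdict

def total_error_by_genre(rows):
    """rows: iterable of (genre, error). Returns {genre: (over, under)}."""
    totals = defaultdict(lambda: [0, 0])
    for genre, error in rows:
        if error > 0:
            totals[genre][0] += error   # slice in the positive half
        elif error < 0:
            totals[genre][1] += error   # slice in the negative half
        # An error of 0 has zero thickness and adds no slice.
    return {genre: tuple(pair) for genre, pair in totals.items()}
```

A genre whose positive total clearly outweighs its negative total was, on balance, rated higher than the reference valences.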

Discussion

General conclusion

The present study sought to investigate the relationship between musical sophistication and valence estimation error by comparing people’s valence estimates against values provided by Spotify and relating the difference to their musical sophistication. Contrary to the main hypothesis, the results showed no negative relationship between musical sophistication and valence estimation error. This indicates that musical sophistication is not a predictor of accurate perception of musical valence.

Subsequently, we conducted exploratory research to gain insights into relationships that we initially did not hypothesize. It seems that the only genre of music where our hypothesis holds is Soul/Blues; however, since we investigated 12 genres, it is plausible that the negative relationship observed is simply a statistical fluke. More interestingly, from multiple graphs, it can be seen that fragments in the Classical, House, Dance and Soundtracks genres were overestimated by our sample. It is plausible that this might be because the valences used by Spotify in these genres are undervalued, or because participants’ ratings of these fragments were confounded by some other features of the tracks. Since these insights are of an exploratory nature though, they should not be taken as concrete conclusions; rather, they should be taken as advice for future research into the topic of musical sophistication and the identification of musical features.
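The "statistical fluke" argument can be made concrete with a rough calculation. Assuming, purely for illustration, that each of the 12 genres were tested independently at a significance level of 0.05 (neither assumption is taken from our analysis), the chance of at least one spurious result is:

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests
```

With 12 tests this is roughly 0.46, so a single genre showing the hypothesized pattern is weak evidence on its own.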

References and repository

Frijda, N. H. (1986). The emotions. Cambridge: Cambridge University Press.

International Federation of the Phonographic Industry. (2018). IFPI Global Music Report 2018. Retrieved from: https://www.ifpi.org/news/IFPI-GLOBAL-MUSIC-REPORT-2018

Spotify. (2018). Spotify Technology S.A. Announces Financial Results for First Quarter 2018. Retrieved from: https://investors.spotify.com/financials/press-release-details/2018/Spotify-Technology-SA-Announces-Financial-Results-for-First-Quarter-2018/default.aspx

Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191-1207.

https://github.com/knrds/everydayMusicListening

Limitations

Sample size

Our survey was completed by 42 participants. Because each participant listened to a unique, randomized set of fragments, some of the fragments had relatively few observations, and 3 tracks had no observations at all. While this does not influence the main analysis (as each participant’s average score was used there), it does affect the plotted relationship between songs’ true valences and their estimated valences, as some of the data points in that graph are based on nine observations, whereas others are based on only one.
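A back-of-the-envelope calculation shows why sparse coverage is expected with this design. The sketch assumes that every participant rated a full playlist of 25 songs drawn uniformly from the 240-song pool, independently of the other participants; the function name is illustrative.

```python
def coverage_expectations(n_participants=42, playlist_size=25, pool_size=240):
    """Return (mean observations per song, expected number of unheard songs)."""
    p_included = playlist_size / pool_size         # chance a song is in one playlist
    mean_observations = n_participants * p_included
    p_never = (1 - p_included) ** n_participants   # song missed by every participant
    return mean_observations, pool_size * p_never
```

Under these assumptions each song is rated about 4.4 times on average, and a little over two songs are expected to receive no ratings, which is in line with the three unrated tracks we observed.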

Survey limitations

Musical fragments were only played once when participants took the survey. Because of this, if a participant failed to attend to a given fragment, they still had to rate its valence with the limited insight they had. Furthermore, after receiving feedback from some participants, we realized that not all of the fragments were of equal volume, and that some of the participants changed the volume multiple times during the survey, which could also have interfered with their attentiveness. This means that some of the valence ratings could have been different, and maybe even more accurate, if participants had been able to replay songs.

Sample representativeness

Our sample mainly consisted of undergraduates studying in the Netherlands. This is likely only a particular subset of the population that Spotify draws on for computing the various track features it offers, including valence. Therefore, if we had used a sample representative of the population that uses Spotify (and not a subset of it), it is plausible that the valence estimates we observed would have been more in line with those provided by Spotify.

Future research

The research we conducted raises several questions which could be addressed in further academic research. First of all, it would be of particular interest to verify whether our finding that musically sophisticated individuals are no better at estimating the valence of a song holds for other music features as well. It would also be interesting to find out how our conclusion changes if more objectively measurable features of music were used, such as key, mode (major vs. minor) or tempo. If the results of such research corresponded to our findings, this would imply that more everyday music exposure does not make us better at identifying music characteristics in any way.

Second, the fact that the valences of songs in the Classical, House, Dance and Soundtracks genres were systematically overvalued in our sample (if we are willing to accept the assumption that Spotify’s valence values can be regarded as true) makes one wonder about the possible causes of this empirical observation. For instance, is it difficult to assess the valence of these genres because these songs often contain few or no vocals that convey a message? Would people rate the valences of songs differently depending on whether they contain vocals? Or, as House and Dance are genres with a relatively high tempo, might the overvaluation be linked to the tempo of the songs? These are all questions that remain to be answered.